<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
       
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  <title>BNF Converter C Mode</title>
</head>
  <body>
   
<div style="text-align: center; ">  
<h2>BNF Converter</h2>
   
<h2>C Mode</h2>
  </div>
    
<h3>By Michael Pellauer</h3>
   
<h3>2003</h3>
  One of the best features of the LBNF file format is that it is completely 
independent of the target programming language. The new C mode allows the 
BNF Converter to output a compiler front end in C using Flex and Bison. In 
theory it will also work with Lex and YACC as well, although this has not 
been thoroughly tested.<br>
  <br>
  BNFC C mode has been tested to work with Flex version 2.5.4 and Bison version 1.875.<br>
  <br>
  BNF Converter Homepage:<br>
  <a href="http://www.cs.chalmers.se/%7Emarkus/BNFC/">http://www.cs.chalmers.se/~markus/BNFC/</a>
 <br>
  <br>
  Flex Homepage:<br>
  <a href="http://www.gnu.org/software/flex/flex.html">http://www.gnu.org/software/flex/flex.html</a>
 <br>
  <br>
  Bison Homepage:<br>
  <a href="http://www.gnu.org/software/bison/bison.html">http://www.gnu.org/software/bison/bison.html</a>
 <br>
  <br>
   
<h3>USAGE<br>
  </h3>
   
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
 bnfc [-m] -cpp FILENAME.cf</span></big><br style="font-family: monospace; ">
  </div>
  <br>
  To access the C mode simply pass the -c command to bnfc. <br>
  The result will be the following files:<br>
  <br>
   
<table cellpadding="2" cellspacing="2" border="0" style="text-align: left; width: 100%; ">
    <tbody>
      <tr>
        <td style="vertical-align: top; text-decoration: underline; "><big><span style="font-family: monospace; ">
 Filename:</span></big></td>
        <td style="vertical-align: top; text-decoration: underline; "><big><span style="font-family: monospace; ">
 Description:</span></big></td>
      </tr>
      <tr>
        <td style="vertical-align: top; "><big><span style="font-family: monospace; ">
 Absyn.h, Absyn.c</span></big><big><span style="font-family: monospace; "><br>
        </span></big><big><span style="font-family: monospace; ">FILENAME.l<br>
        </span></big><big><span style="font-family: monospace; ">FILENAME.y<br>
  Pri</span></big><big><span style="font-family: monospace; ">nter.h, Printer.c<br>
        </span></big><big><span style="font-family: monospace; ">Skeleton.h,
 Skeleton.c<br>
  Parser.h<br>
        </span></big><big><span style="font-family: monospace; ">Test.c<br>
        </span></big><big><span style="font-family: monospace; ">Makefile</span></big><br>
        </td>
        <td style="vertical-align: top; "><big><span style="font-family: monospace; ">
 Abstract Syntax tree interface &amp;&amp; structs</span></big><big><span style="font-family: monospace; "></span></big><br>
        <big><span style="font-family: monospace; ">Flex file (lexer specification)<br>
  Bison</span></big><big><span style="font-family: monospace; "> file (parser 
specification)<br>
        </span></big><big><span style="font-family: monospace; ">A Pretty
Printer for the abstract syntax tree.<br>
        </span></big><big><span style="font-family: monospace; ">A code skeleton 
for traversing the syntax tree.<br>
        </span></big><big><span style="font-family: monospace; ">Contains
definitions of tokens for the lexer.<br>
  A testbench to test the final result.<br>
        </span></big><big><span style="font-family: monospace; ">A Makefile 
(generated only with the -m flag).</span></big><br>
        </td>
      </tr>
       
  </tbody>  
</table>
  <br>
  <br>
   
<h3>Compiling the Compiler Front End</h3>
  You can use the generated Makefile to compile the generated C Code with 
"make" ("gmake" on some systems).<br>
  <br>
  If all goes well the following files should be generated:<br>
  <br>
   
<table cellpadding="2" cellspacing="2" border="0" style="text-align: left; width: 100%; ">
    <tbody>
      <tr>
        <td style="vertical-align: top; font-family: monospace; text-decoration: underline; "><big>
 File:</big></td>
        <td style="vertical-align: top; font-family: monospace; text-decoration: underline; "><big>
 Description:</big></td>
      </tr>
      <tr>
        <td style="vertical-align: top; font-family: monospace; "><big>Absyn.o<br>
  Lexer.c<br>
  Lexer.o<br>
  Parser.c<br>
  Parser.o<br>
  Printer.o<br>
  Test.o<br>
  testFILENAME<br>
        </big></td>
        <td style="vertical-align: top; font-family: monospace; "><big> Abstract 
Syntax files<br>
  Lexer generated by Flex<br>
  Compiled Lexer<br>
  Parser generated by Bison<br>
  Compiled parser<br>
  Compiled Pretty Printer<br>
  Compiled Test Bench<br>
  Compiled executable of all .o files<br>
        </big></td>
      </tr>
       
  </tbody>  
</table>
   
<h3>Testing the Compiler Front End<br>
  </h3>
  The executable <span style="font-family: monospace; font-weight: bold; ">
 testFILENAME</span> (where <span style="font-family: monospace; font-weight: bold; ">
 FILENAME</span> is the name of the <span style="font-family: monospace; font-weight: bold; ">
 .cf</span> file) may be used to test the result of the generation. To use 
it simply give it the name of a file written in the target language. If parsing 
is correct, the abstract syntax tree (in Haskell-like syntax) and linearized 
pretty-printed result will be shown.<br>
  <br>
  For example, if we had created a subset of Java called Javalette, in <span style="font-family: monospace; font-weight: bold; ">
 Javalette.cf</span>:<br>
  <br>
  <big><span style="font-family: monospace; font-weight: bold; ">&gt; ./testJavalette 
helloworld.jl</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">Parse Successful!</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">[Abstract Syntax]</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">(PDefList [(FuncDef TInt(FFuncName 
"main")[][(SPrintString "Hello World"),</span><span style="font-family: monospace; ">
  (SReturn NoExpression)]NoSemicolon)])</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">[Linearized Tree]</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">int main () </span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">{</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">&nbsp; printString ("Hello World") 
;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">&nbsp; return ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">&nbsp; </span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">}</span><br style="font-family: monospace; ">
  </big><br>
  If no argument is given then the executable attempts to read from stdin.<br>
  <br>
   
<h3>The Abstract Syntax Tree</h3>
  The Abstract Syntax tree is generated following the method Andrew Appel
outlines in<br>
  "Modern Compiler Construction in C"<br>
  <a href="http://www.cs.princeton.edu/%7Eappel/modern/c/">http://www.cs.princeton.edu/~appel/modern/c/</a>
 <br>
  <br>
  The generated code closely follows Appels methodology, where each struct
 represents an LBNF category. Labels are represented by a kind field, and 
each kind has a field in a union, with the field name based on the LBNF label 
name. <br>
  Here is an example. Let us say that we have made a simple LBNF file to
describe lists of boolean expressions, called <span style="font-family: monospace; font-weight: bold; ">
 BoolExp.cf</span>:<br>
  <br>
  <big><span style="font-family: monospace; ">--Begin LBNF file</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">PROG.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 PROG&nbsp;&nbsp; ::= [EXP] ;</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">OrExp.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 EXP&nbsp;&nbsp;&nbsp;&nbsp; ::= EXP "||" EXP1 ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">AndExp.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 EXP1&nbsp;&nbsp;&nbsp; ::= EXP1 "&amp;&amp;" EXP2 ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">TVal.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 EXP2&nbsp;&nbsp;&nbsp; ::= "true" ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">FVal.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 EXP2&nbsp;&nbsp;&nbsp; ::= "false" ;<br>
  </span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">separator EXP ";" ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">coercions EXP 2 ;</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">--LBNF file ends here</span></big><br>
  <br>
  Absyn.H&nbsp; will define the following classes:<br>
<br>
<big><tt><font color="#3333ff">struct</font> PROG_<br>
 {<br>
 &nbsp; <font color="#3333ff">enum</font> { is_PROG } kind;<br>
 &nbsp; <font color="#3333ff">union</font><br>
 &nbsp; {<br>
 &nbsp;&nbsp;&nbsp; <font color="#3333ff">struct</font> { ListEXP listexp_;
} program_;<br>
 &nbsp; } u;<br>
 };<br>
<font color="#3333ff"> typedef</font> <font color="#3333ff">struct</font>
 PROG_ *PROG;<br>
 struct EXP_<br>
 {<br>
 &nbsp; <font color="#3333ff">enum</font> { is_EOr, is_EAnd, is_ETrue, is_EFalse,
is_EVar } kind;<br>
 &nbsp; <font color="#3333ff">union</font><br>
 &nbsp; {<br>
 &nbsp;&nbsp;&nbsp; struct { EXP exp_1, exp_2; } eor_;<br>
 &nbsp;&nbsp;&nbsp; struct { EXP exp_1, exp_2; } eand_;<br>
 &nbsp;&nbsp;&nbsp; struct { Ident ident_; } evar_;<br>
 &nbsp; } u; <br>
 };<br>
<font color="#3333ff"> typedef</font> <font color="#3333ff">struct</font>
 EXP_ *EXP;<br>
 <font color="#3333ff">struct</font> ListEXP_<br>
 {<br>
 &nbsp; EXP exp_;<br>
 &nbsp; ListEXP listexp_;<br>
 };<big><br style="font-family: monospace; ">
  <span style="font-family: monospace; "></span></big>typedef <font color="#3333ff">
struct</font> ListEXP_ *ListEXP;</tt></big><br>
 <br>
  Note that as in Appel's method, the actual struct name ends with an underscore, 
and a typedef removes the need to specify pointers with *. Singular categories 
such as <big><span style="font-family: monospace; ">PROG</span></big> maintain 
their kind field for simplicity even though it has no other possible alternatives.<br>
 <br>
 BNFC generates constructor functions to create structs with a specific kind. 
For example, for the constructors for the EXP struct above:<br>
 &nbsp;<big><tt><br>
 EXP make_EOr(EXP p1, EXP p2)<br>
 {<br>
 &nbsp; EXP tmp = (EXP) <b>malloc</b>(<b>sizeof</b>(*tmp));<br>
 &nbsp; <b>if</b> (!tmp)<br>
 &nbsp; {<br>
 &nbsp;&nbsp;&nbsp; fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating EXP!\n"</font>);<br>
 &nbsp;&nbsp;&nbsp; exit(1);<br>
 &nbsp; }<br>
 &nbsp; tmp-&gt;kind = is_EOr;<br>
 &nbsp; tmp-&gt;u.eor_.exp_1 = p1;<br>
 &nbsp; tmp-&gt;u.eor_.exp_2 = p2;<br>
&nbsp; <b>return</b> tmp;<br>
 }<br>
 ...<br>
 EXP make_ETrue()<br>
 {<br>
 &nbsp; EXP tmp = (EXP) malloc(sizeof(*tmp));<br>
 &nbsp; if (!tmp)<br>
 &nbsp; {<br>
 &nbsp;&nbsp;&nbsp; fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating EXP!\n"</font>);<br>
 &nbsp;&nbsp;&nbsp; exit(1);<br>
 &nbsp; }<br>
 &nbsp; tmp-&gt;kind = is_ETrue;<br>
&nbsp; <b>return</b> tmp;<br>
 }<br>
 ...</tt></big><big> </big> <br>
 <br>
  <span style="font-weight: bold; "></span>The <big><span style="font-family: monospace; ">
 ListEXP</span></big> struct is automatically generated to represent the
list of <big><span style="font-family: monospace; ">EXP</span></big>. This
is significantly different than the standard BNF Haskell mode, which usesHaskell's
built-in lists. Currently for each class <span style="font-style: italic; ">
NAME</span>  that can be held in a list, a struct List<span style="font-style: italic; ">
 NAME</span> is generated. These are straightforward null-terminated linked 
lists and should be familiar to all C programmers. In the future this feature
 could be extended to C++'s Standard Template Library.<br>
  <br>
  For example, here is the constructor for <big><span style="font-family: monospace; ">
 ListEXP</span></big>:<br>
  <big><tt><big><br>
 </big>ListEXP make_ListEXP(EXP p1, ListEXP p2)<br>
 {<br>
 &nbsp; ListEXP tmp = (ListEXP) <b>malloc</b>(<b>sizeof</b>(*tmp));<br>
 &nbsp; <b>if</b> (!tmp)<br>
 &nbsp; {<br>
 &nbsp;&nbsp;&nbsp; fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating ListEXP!\n"</font>);<br>
 &nbsp;&nbsp;&nbsp; exit(1);<br>
 &nbsp; }<br>
 &nbsp; tmp-&gt;exp_ = p1;<br>
 &nbsp; tmp-&gt;listexp_ = p2;<br>
 &nbsp; <b>return</b> tmp;<br>
 }<big><span style="font-family: monospace; "><span style="color: rgb(51,51,255); ">
 </span> </span></big></tt></big><big><span style="font-family: monospace; ">
 </span><br style="font-family: monospace; ">
  </big><br style="font-family: monospace; ">
  The programmer can then traverse through the abstract syntax tree using 
the generated Skeleton files (see below). The PrettyPrinter is an example 
of this. It contains two functions that traverse the tree and return Strings. 
The "show" functions print a Haskell-like view of the data type names. The 
"print" function is a pretty-printer that uses simple heuristics (easily changed
in the functions "renderC" and "renderS") to linearize the syntax tree.<br>
  <br>
  We can see the PrettyPrinter class at work in the automatically generated 
file <b><tt>Test.c</tt></b>. Continuing our early example, let us write a 
simple input file in our language of boolean expressions:<br>
  <br>
  <big><span style="font-family: monospace; ">true &amp;&amp; (false || true);</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">true;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">false || false</span><br style="font-family: monospace; ">
  </big><br>
  We can now test our parser using the Test class:<br>
  <br style="font-weight: bold; ">
  <big><span style="font-family: monospace; font-weight: bold; ">&gt; ./testBoolExp 
testfile</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">Parse Successful!</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">[Abstract Syntax]</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">(PROG [(AndExp TVal(OrExp FValTVal)), 
TVal, (OrExp FValFVal)])</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">[Linearized Tree]</span><br style="font-family: monospace; ">
  <br style="font-family: monospace; ">
  <span style="font-family: monospace; ">true &amp;&amp; (false || true)
;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">true ;</span><br style="font-family: monospace; ">
  <span style="font-family: monospace; ">false || false </span></big><br>
  <br>
   
<h3>Using the Skeleton File</h3>
  Now that we have sucessfully constructed a parser, it's time to do something 
useful with it. This generally involves traversing the Abstract Syntax tree 
and producing a new data structure, such as translating from a high-level 
language to assembly code. To assist the programmer with this process the 
BNF Converter's C++ Mode provides the files <span style="font-family: monospace; font-weight: bold; ">
 Skeleton.h</span> and <span style="font-family: monospace; font-weight: bold; ">
 Skeleton.c</span><br>
  <br>
  This file is an empty skeleton that traverses the abstract syntax tree
(currently doing nothing interesting). It uses simple switch statements to
traverse the tree based on the kind field of the struct. For example, here
is the code that would be generated for EXP, above:<big><tt><br>
 <br>
 <font color="#3333ff">void</font> visitEXP(EXP _p_)<br>
 {<br>
 &nbsp; <font color="#3333ff">switch</font>(_p_-&gt;kind)<br>
 &nbsp; {<br>
 &nbsp; <font color="#3333ff">case</font> is_EOr:<br>
 &nbsp;&nbsp;&nbsp; <font color="#cc0000">/* Code for EOrGoes Here */</font><br>
 &nbsp;&nbsp;&nbsp; visitEXP(_p_-&gt;u.eor_.exp_1);<br>
 &nbsp;&nbsp;&nbsp; visitEXP(_p_-&gt;u.eor_.exp_2);<br>
 &nbsp;&nbsp;&nbsp; <font color="#3333ff">break;</font><br>
 &nbsp; <font color="#3333ff">case</font> is_EAnd:<br>
 &nbsp;&nbsp;&nbsp; <font color="#cc0000">/* Code for EAndGoes Here */</font><br>
 &nbsp;&nbsp;&nbsp; visitEXP(_p_-&gt;u.eand_.exp_1);<br>
 &nbsp;&nbsp;&nbsp; visitEXP(_p_-&gt;u.eand_.exp_2);<br>
 &nbsp;&nbsp;&nbsp; <font color="#3333ff">break;</font><br>
 &nbsp; <font color="#3333ff">case</font> is_ETrue:<br>
 &nbsp;&nbsp;&nbsp; <font color="#cc0000">/* Code for ETrueGoes Here */</font><br>
 &nbsp;&nbsp;&nbsp; break;<br>
 &nbsp; <font color="#3333ff">case</font> is_EFalse:<br>
 &nbsp;&nbsp;&nbsp; <font color="#cc0000">/* Code for EFalseGoes Here */</font><br>
 &nbsp;&nbsp;&nbsp; <font color="#3333ff">break;</font><br>
 &nbsp; <font color="#3333ff">case</font> is_EVar:<br>
 &nbsp;&nbsp;&nbsp; <font color="#cc0000">/* Code for EVarGoes Here */</font><br>
 &nbsp;&nbsp;&nbsp; visitIdent(_p_-&gt;u.evar_.ident_);<br>
 &nbsp;&nbsp;&nbsp; <font color="#3333ff">break;</font><br>
 &nbsp; <font color="#3333ff">default:</font><br>
 &nbsp;&nbsp;&nbsp; fprintf(stderr, "Error: bad kind field when printing
EXP!\n");<br>
 &nbsp;&nbsp;&nbsp; exit(1);<br>
 &nbsp; }<br>
 }</tt></big><br>
 <br>
 To use the skeleton the programmer must generally do the following:<br>
   
<ul>
    <li>Copy <span style="font-family: monospace; font-weight: bold; ">Skeleton.h</span>
  and <b><span style="font-family: monospace; ">Skeleton.c</span></b> to
a new file</li>
    <li>Change the name of the functions from <tt>visit</tt> to something
meaningful.</li>
   <li>Change the parameters and return types of the <tt>visit </tt>functions 
with a search-and-replace.<br>
   </li>
    <li>Add some variables and a top level method to begin the transformation.</li>
   
</ul>
  &nbsp;From this point it should be fairly straightforward to implement
the  desired algorithm. Good examples of it can be found in the two functions
in <span style="font-family: monospace; font-weight: bold; ">Printer.c</span><br>
    
<h3>Known Issues<br>
  </h3>
  Here is a list of known issues with the BNF Converter's C Mode.<br>
  <br>
  <span style="font-weight: bold; text-decoration: underline; ">Keywords</span><br>
  <br>
  C contains many more keywords than Haskell, and currently there is nothing 
to prevent a programmer from using these keywords in their LBNF grammar. For
example:<br>
  <br>
  <big><span style="font-family: monospace; ">Static.&nbsp;&nbsp; Typedef 
&nbsp; ::= Sychronized "." Switch ;</span></big><br>
  <br>
  While this will be accepted by BNF it might result in un-compilable code, 
as the generated C occaisionally uses lowercase names for instance variables. 
As such programmers should generally avoid matching Label names with C keywords.<br>
  <br>
  <br>
  <span style="font-weight: bold; text-decoration: underline; ">Reuse of
Label Names</span><br>
  <br>
  Reusing label names currently results in a warning from BNFC. However,
for C this is a much more serious issue, as they are used for enumerated
types. For example:<br>
  <big><tt><br>
 <font color="#3333ff">struct</font> FOO_<br>
 {<br>
 &nbsp; <font color="#3333ff">enum </font>{ is_BAR...} kind;<br>
 ...<br>
 <font color="#3333ff">struct</font> BAZ_<br>
 &nbsp; enum { is_BAZ... <font color="#cc0000">/* Whoops! The compiler won't 
like this! */</font></tt></big><br>
 <br>
  The best solution is for the programmer to ensure that all label names
are unique and not ignore the BNFC warning.<br>
  <br>
   
</body>
</html>
